New Adaptation Techniques for Large Vocabulary Continuous Speech Recognition
Abstract
This paper proposes several new speaker adaptation techniques to improve the accuracy of large vocabulary continuous speech recognition. These include discriminative adaptation, state-quality-measure-based adaptation, and N-best-hypothesis-based adaptation schemes. We propose to incorporate the MMIE criterion in the computation of the posterior counts from the adaptation data. We present a new measure, the state quality measure, to evaluate the quality of an HMM state, and subsequently use it for selecting good segments of speech during unsupervised adaptation and as a confidence measure during decoding/rescoring. The state quality measure is the confidence associated with the acoustic model's ability to predict the HMM state correctly. It is estimated from the correct and decoded sets of transcriptions and is used in conjunction with N-best hypotheses for weighting the state occupancy counts during adaptation. In conjunction with the adaptation schemes, we also use the Viterbi algorithm, instead of the Forward-Backward algorithm, to estimate the HMM state occupancy counts in order to obtain speedups without degradation in accuracy. Our results on an in-house spontaneous speech task show relative improvements in the range of 4% to 14% for each of the presented techniques.
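As a rough illustration of the last point, the sketch below contrasts hard state occupancy counts taken from the single best Viterbi path with the soft counts produced by the Forward-Backward algorithm. It is a generic textbook formulation on a toy HMM, not the authors' implementation: the function names, the log-domain list interface, and the absence of any state-quality or N-best weighting are all assumptions made for illustration.

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) over a list of log values."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))

def viterbi_occupancy(log_init, log_trans, log_obs):
    """Hard counts: each frame contributes 1 to the state on the best path.

    log_init[s], log_trans[r][s], log_obs[t][s] are log probabilities
    (hypothetical interface chosen for this sketch).
    """
    T, S = len(log_obs), len(log_init)
    delta = [log_init[s] + log_obs[0][s] for s in range(S)]
    backptr = []
    for t in range(1, T):
        prev, delta, ptr = delta, [], []
        for s in range(S):
            best = max(range(S), key=lambda r: prev[r] + log_trans[r][s])
            ptr.append(best)
            delta.append(prev[best] + log_trans[best][s] + log_obs[t][s])
        backptr.append(ptr)
    # Trace back from the best final state.
    state = max(range(S), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    counts = [0.0] * S
    for s in path:
        counts[s] += 1.0
    return path, counts

def forward_backward_occupancy(log_init, log_trans, log_obs):
    """Soft counts: each frame contributes its state posterior."""
    T, S = len(log_obs), len(log_init)
    alpha = [[log_init[s] + log_obs[0][s] for s in range(S)]]
    for t in range(1, T):
        alpha.append([
            logsumexp([alpha[t - 1][r] + log_trans[r][s] for r in range(S)])
            + log_obs[t][s]
            for s in range(S)
        ])
    beta = [[0.0] * S for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [
            logsumexp([log_trans[s][r] + log_obs[t + 1][r] + beta[t + 1][r]
                       for r in range(S)])
            for s in range(S)
        ]
    log_z = logsumexp(alpha[-1])  # total log likelihood of the observations
    counts = [0.0] * S
    for t in range(T):
        for s in range(S):
            counts[s] += math.exp(alpha[t][s] + beta[t][s] - log_z)
    return counts
```

Both functions return counts that sum to the number of frames. The Viterbi version needs only one max-pass and a traceback, which is where a speedup of the kind mentioned in the abstract would plausibly come from, at the cost of discarding probability mass on competing paths.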
Similar resources
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...
Speaker-independent upfront dialect adaptation in a large vocabulary continuous speech recognizer
Large vocabulary continuous speech recognition systems show a significant decrease in performance if a user's pronunciation differs largely from those observed during system training. This can be considered the main reason why most commercially available systems recommend, if not enforce, that the individual end user read an enrollment script for the speaker dependent reestimation of acoustic mo...
MLLR method for Environmental Adaptation in a Continuous Farsi Speech Recognition
In this paper, MLLR adaptation of continuous density HMMs is investigated in a Farsi speaker-independent large vocabulary continuous speech recognition system in an attempt to improve the recognition rate in real-world situations. In the MLLR framework, we have experimented with the use of Gaussian mean transformations in global adaptation and regression tree based adaptation. Besides full and block-diagonal...